4.1 Hypothesis Testing Introduction

1 Hypothesis Testing

Assume we have a model $\mathcal{P}=\{P_\theta : \theta\in\Theta\}$ (as usual, it may be nonparametric). We have two competing hypotheses about $\theta$:
$$H_0: \theta\in\Theta_0 \quad \text{vs} \quad H_1: \theta\in\Theta_1.$$

Here $\Theta_0\cap\Theta_1=\emptyset$ and $\Theta_0\cup\Theta_1=\Theta$ ($\Theta_0,\Theta_1$ are disjoint).

A hypothesis is called simple if it fully specifies the data distribution (e.g. $H_0:\theta=\theta_0$ for a fixed $\theta_0$).
Now we want to use the data $X\sim P_\theta$ to determine whether $H_0$ or $H_1$ is true.

2 The Test Function

We describe a test by its critical/test function
$$\phi(x)=\begin{cases}0, & \text{accept } H_0,\\ \gamma\in(0,1), & \text{reject } H_0 \text{ with probability } \gamma,\\ 1, & \text{reject } H_0.\end{cases}$$

The $\gamma$ part is there to "top off" the Type I error rate when $T(X)$ is discrete and $P_{\theta_0}(T(X)>c_\alpha)<\alpha$. In practice, we will skip the randomized $\gamma$ part.

Therefore we partition the sample space $\mathcal{X}$ into a rejection region $R=\{x\in\mathcal{X} : \phi(x)=1\}$ and an acceptance region $A=\{x\in\mathcal{X} : \phi(x)=0\}$.
We will define a test statistic $T(X)$ and a critical threshold $c\in\mathbb{R}$. So $\phi$ rejects for large $T(X)$ if
$$\phi(x)=\begin{cases}0, & T(x)<c,\\ \gamma\in(0,1), & T(x)=c,\\ 1, & T(x)>c,\end{cases}$$
or $\phi$ rejects for extreme $T(X)$ if
$$\phi(x)=\begin{cases}0, & |T(x)|<c,\\ \gamma\in(0,1), & |T(x)|=c,\\ 1, & |T(x)|>c.\end{cases}$$
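As a sketch, the randomized right-tailed test function above can be written directly in Python (the function name `phi` and its interface are illustrative, not from the notes):

```python
import random

def phi(t, c, gamma=0.0, rng=random):
    """Randomized right-tailed test: return 1 (reject H0) or 0 (accept H0).

    Rejects when the statistic t exceeds the threshold c; at the boundary
    t == c, rejects with probability gamma.
    """
    if t > c:
        return 1
    if t == c:
        return 1 if rng.random() < gamma else 0
    return 0

print(phi(2.1, 1.645))  # -> 1 (2.1 > 1.645, so reject)
```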

2.1 Significance Level, Power

Inevitably, testing produces errors. There are two types of error we can make:

  • Type I error: rejecting $H_0$ when $H_0$ is true;
  • Type II error: failing to reject $H_0$ when $H_1$ is true.

Our goal is to make the Type II error as small as possible, while controlling the Type I error below a pre-specified value $\alpha\in[0,1]$.
Define the power function $\beta_\phi(\theta)=\mathbb{E}_\theta[\phi(X)]=P_\theta(\text{reject } H_0)$.
The Type I/II errors can be written explicitly in terms of the power function:

|                     | Type I Error         | Type II Error          |
|---------------------|----------------------|------------------------|
| $\theta\in\Theta_0$ | $\beta_\phi(\theta)$ | $0$                    |
| $\theta\in\Theta_1$ | $0$                  | $1-\beta_\phi(\theta)$ |

So our goal can be expressed as
$$\max_\phi\ \beta_\phi(\theta),\ \theta\in\Theta_1,\qquad \text{s.t.}\ \beta_\phi(\theta)\le\alpha\ \ \forall\,\theta\in\Theta_0.$$
We say $\phi$ is a level $\alpha$ test if $\sup_{\theta\in\Theta_0}\beta_\phi(\theta)\le\alpha$ (if the sup is strictly below $\alpha$, we say the test is conservative). We commonly use $\alpha=0.05$.
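As a concrete (hypothetical, not from the notes) example of checking a test's level: suppose we test $H_0: p\le 1/2$ for a coin by rejecting when we see $X\ge 15$ heads in $n=20$ tosses, $X\sim\mathrm{Binomial}(20,p)$. The power function $\beta(p)=P_p(X\ge 15)$ can be computed exactly:

```python
from math import comb

n, c = 20, 15  # reject H0: p <= 1/2 when we see X >= 15 heads in n = 20 tosses

def beta(p):
    """Power function beta(p) = P_p(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c, n + 1))

# beta is increasing in p, so the sup over the null {p <= 1/2} is at p = 1/2.
level = beta(0.5)
print(round(level, 4))  # -> 0.0207 (a conservative level-0.05 test)
```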

2.2 Example: Z test

Assume we observe $Z(X)\sim N(\theta,1)$. We use the right-tailed test $\phi_1(z)=\mathbf{1}\{z>z_\alpha\}$ that rejects for large $Z$. Here $z_\alpha=\Phi^{-1}(1-\alpha)$ is the upper $\alpha$ quantile of the $N(0,1)$ distribution (the top $100\alpha\%$).

If we want to test a two-sided hypothesis, we use the two-tailed test $\phi_2(z)=\mathbf{1}\{|z|>z_{\alpha/2}\}$.
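A minimal sketch of both Z tests using only the Python standard library (`statistics.NormalDist`); the names `phi1`/`phi2` mirror the notes:

```python
from statistics import NormalDist

alpha = 0.05
z_alpha = NormalDist().inv_cdf(1 - alpha)     # upper-alpha quantile, ~1.645
z_half = NormalDist().inv_cdf(1 - alpha / 2)  # upper-alpha/2 quantile, ~1.960

def phi1(z):
    """Right-tailed Z test: reject H0 for large z."""
    return int(z > z_alpha)

def phi2(z):
    """Two-tailed Z test: reject H0 for extreme |z|."""
    return int(abs(z) > z_half)

print(phi1(2.0), phi2(2.0))    # -> 1 1
print(phi1(-2.5), phi2(-2.5))  # -> 0 1
```

Note that a very negative $z$ is evidence against $H_0$ for the two-sided test but not for the right-tailed one.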

3 Optimal Testing

3.1 Likelihood Ratio Test

We start with the simplest case:
$$H_0: X\sim P_0 \quad\text{vs}\quad H_1: X\sim P_1. \tag{3.1}$$
WLOG assume $P_0,P_1$ have densities $p_0,p_1$ w.r.t. a common dominating measure $\mu$.
Define the likelihood ratio
$$\mathrm{LR}(X)=\frac{p_1(X)}{p_0(X)}.$$

LRT

The likelihood ratio test (LRT) is
$$\phi(x)=\begin{cases}1, & \mathrm{LR}(x)>c,\\ \gamma, & \mathrm{LR}(x)=c,\\ 0, & \mathrm{LR}(x)<c.\end{cases}$$

Theorem (Neyman-Pearson Lemma)

The likelihood ratio test $\phi$ with $\mathbb{E}_0[\phi(X)]=\alpha$ maximizes power among all level $\alpha$ tests of (3.1).

We should set $c_\alpha$ to be the upper-$\alpha$ quantile of the null distribution of $\mathrm{LR}(X)$, i.e. $P_0(\mathrm{LR}(X)>c_\alpha)<\alpha\le P_0(\mathrm{LR}(X)\ge c_\alpha)$, and then set
$$\gamma=\frac{\alpha-P_0(\mathrm{LR}(X)>c_\alpha)}{P_0(\mathrm{LR}(X)=c_\alpha)}.$$

Because of discreteness, we can't always make the Type I error exactly $\alpha=0.05$. For example, suppose $X\sim\mathrm{Binomial}(n,1/2)$ under the null, with $n=100$ and $\alpha=0.05$, and take $c_\alpha=58$ (the 0.95 quantile under the null). Then $P_{0.5}(X>58)=0.044<0.05$, so we randomize at the boundary by setting $\gamma=\frac{0.05-0.044}{P_{0.5}(X=58)}\approx 0.26$, and $\phi$ rejects with probability $0.26$ when $X=58$.
But in practice we seldom use randomized tests, so we instead use the conservative test $\phi(X)=\mathbf{1}\{X>58\}$.
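These boundary quantities are easy to compute exactly. A minimal sketch in Python (stdlib only), assuming $X\sim\mathrm{Binomial}(100,1/2)$ under the null:

```python
from math import comb

n, alpha, c = 100, 0.05, 58

def pmf(k):
    """P(X = k) for X ~ Binomial(n, 1/2)."""
    return comb(n, k) * 0.5**n

tail = sum(pmf(k) for k in range(c + 1, n + 1))  # P_{0.5}(X > 58), ~0.044
boundary = pmf(c)                                # P_{0.5}(X = 58), ~0.022
gamma = (alpha - tail) / boundary                # tops Type I error up to alpha
print(f"tail={tail:.3f}, gamma={gamma:.3f}")
```

By construction `tail + gamma * boundary` equals $\alpha$ exactly; depending on rounding, $\gamma$ comes out around $0.25$–$0.26$.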

3.2 UMP Tests

UMP Tests

We say a test $\phi$ is a uniformly most powerful (UMP) level $\alpha$ test of $H_0$ against $H_1$ if

  • it is a valid level $\alpha$ test;
  • for any other valid level $\alpha$ test $\phi'$, $\beta_\phi(\theta)\ge\beta_{\phi'}(\theta)$ for all $\theta\in\Theta_1$.
MLR

$\mathcal{P}=\{P_\theta : \theta\in\Theta\subseteq\mathbb{R}\}$ has monotone likelihood ratio (MLR) in $T(X)$ if $\frac{p_{\theta_2}(x)}{p_{\theta_1}(x)}$ is a non-decreasing function of $T(x)$, for any $\theta_1<\theta_2$.

MLR is sufficient for finding a UMP test:

Theorem

Assume $\mathcal{P}$ has MLR in $T(X)$, and consider $H_0:\theta\le\theta_0$ vs $H_1:\theta>\theta_0$, for some $\theta_0\in\Theta\subseteq\mathbb{R}$.
If $\phi(X)$ rejects for large $T(X)$, then $\phi$ is UMP at level $\alpha=\mathbb{E}_{\theta_0}[\phi(X)]$.
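As a quick numeric illustration (not from the notes): the $N(\theta,1)$ location family has MLR in $T(x)=x$, and the power function of the right-tailed Z test is increasing in $\theta$, so its size over $H_0:\theta\le\theta_0$ is attained at the boundary. A sketch, assuming $\theta_0=0$:

```python
from statistics import NormalDist

std = NormalDist()

# The N(theta, 1) family has MLR in T(x) = x: the ratio
# p_{theta2}(x) / p_{theta1}(x) = exp((theta2 - theta1) * x + const)
# is non-decreasing in x whenever theta1 < theta2.
def lr(x, theta1, theta2):
    return NormalDist(theta2, 1).pdf(x) / NormalDist(theta1, 1).pdf(x)

lrs = [lr(x, 0.0, 1.0) for x in (-1.0, 0.0, 1.0, 2.0)]
assert lrs == sorted(lrs)  # monotone likelihood ratio

# Power of the right-tailed test 1{Z > z_alpha} when Z ~ N(theta, 1):
# beta(theta) = 1 - Phi(z_alpha - theta), increasing in theta, so the
# size over H0: theta <= 0 is attained at the boundary theta = 0.
alpha = 0.05
z_alpha = std.inv_cdf(1 - alpha)

def power(theta):
    return 1 - std.cdf(z_alpha - theta)

print(round(power(0.0), 3))  # -> 0.05 (size at theta = 0)
```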